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Abstract  Clusters  of  material  at  the  ocean  surface  have  been  frequently  observed.  Such  accumulations 
of  material  play  an  important  role  in  a  variety  of  applications,  from  biology  to  pollution  mitigation.  Identify¬ 
ing  where  clusters  will  form  can  aid  in  locating,  for  example,  hotspots  of  biological  activity  or  regions  of 
high  pollutant  concentration.  Here  cluster  strength  is  introduced  as  a  new  metric  for  defining  clusters  when 
all  particle  positions  are  known.  To  diagnose  regions  likely  to  contain  clusters  without  the  need  to  integrate 
millions  of  particle  trajectories,  we  propose  to  use  dilation,  which  quantifies  area  changes  of  Lagrangian 
patches.  Material  deformation  is  decomposed  into  dilation  and  area-preserving  stretch  processes  to  refine 
previous  approaches  based  on  finite-time  Lyapunov  exponents  (FTLE)  by  splitting  the  FTLE  into  fundamen¬ 
tal  kinematic  properties.  The  concepts  are  developed  theoretically  and  illustrated  in  the  context  of  a  state- 
of-the-art  data-assimilating  predictive  ocean  model  of  the  Gulf  of  Mexico.  Regions  of  dilation  less  than  one 
are  shown  to  be  much  more  likely  (6  times  more  likely  in  the  given  example)  to  be  visited  by  particles  than 
those  of  dilation  greater  than  one.  While  the  relationship  is  nonlinear,  dilation  and  cluster  strength  exhibit  a 
fairly  good  correlation.  In  contrast,  both  stretch  and  Eulerian  divergence  are  found  to  be  uncorrelated  with 
cluster  strength.  Thus,  dilation  maps  can  be  used  as  guides  for  identifying  cluster  locations,  while  saving 
some  of  the  computational  cost  of  trajectory  integrations. 


1.  Introduction 

The  nonuniform  distribution  of  material  at  the  ocean  surface  has  long  been  cause  for  fascination  [e.g.,  Day 
and  Shaw,  1987;  Yoder  et  al.,  1994;  Toner  et  at.,  2003;  Lebreton  et  a!.,  2012].  The  much  documented  "garbage 
patch"  in  the  Pacific  Ocean  may  be  the  most  prominent  example  of  this  phenomenon  [Kaiser,  2010;  Howell 
et  al.,  2012].  More  recently,  the  abundant  availability  of  satellite  imagery  confirms  that  cluster  patterns  in 
the  distribution  of  materials  such  as  chlorophyll  or  oil  are  not  isolated  curiosities  but  commonplace  in  many 
parts  of  the  world's  oceans.  Figure  1  shows  some  examples.  Here  "cluster"  is  used  loosely  as  an  accumula¬ 
tion  of  material  with  higher  density  than  in  surrounding  areas;  a  rigorous  definition  is  provided  in  section  3. 

To  a  good  approximation,  ocean  flows  are  incompressible.  Thus,  there  is  no  obvious  mechanism  for  uni¬ 
formly  distributed,  neutrally  buoyant  particles  to  cluster.  However,  floating  particles  experience  the  conver¬ 
gence  and  divergence  in  the  two-dimensional  flow  field  resulting  from  downwelling  and  upwelling.  The 
modeling  study  of  Zhong  et  al.  [2012]  showed  that  buoyant  particles  advected  with  the  ocean  flows  prefer¬ 
entially  occupy  locations  with  instantaneous  convergence.  This  result  is  intuitive  for  stationary  or  slowly 
evolving  flows,  where  the  Lagrangian  time  scales  associated  with  particle  motion  are  much  smaller  than  the 
Eulerian  time  scales  associated  with  flow  evolution. 

When  submesoscale  flows  are  considered,  where  the  Lagrangian  and  Eulerian  time  scales  may  be  compara¬ 
ble,  the  instantaneous  state  of  the  velocity  field  may  be  less  useful  as  a  predictor  of  cluster  locations.  Zhong 
et  al.  [2012]  in  fact  did  not  consider  the  predictive  power  of  the  divergence  field  for  particle  accumulation 
patterns. 

One  approach  to  describing  transport  patterns,  sometimes  thought  of  as  "highways"  and  "parking  lots"  for 
advected  material,  that  has  become  widespread  over  the  last  two  decades  is  the  mapping  of  Lagrangian  coher¬ 
ent  structures  (LCS).  These  are  frequently  taken  as  ridges  in  maps  of  the  finite-time  or  finite-space  Lyapunov 
exponents  (FTLE  and  FSLE,  respectively)  [e.g.,  Boffetta  et  al.,  2001;  d'Ovidio  et  al.,  2004;  Shadden  et  al.,  2005; 
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Figure  1 .  (top  left)  A  cluster  of  surface  drifters  and  Sargassum  along  a  front  in  the  northeastern  Gulf  of  Mexico.  (Image  courtesy  Tamay  Ozgokmen.)  (bottom  left)  Accumulating  oil  from 
the  Deepwater  Horizon  oil  spill  in  the  Gulf  of  Mexico.  (Photo  by  Daniel  Beltra.)  (top  right)  Patch  of  sargassum  off  the  east  coast  of  U.S.,  as  observed  by  Envisat.  (Image  courtesy  ESA.)  (bot¬ 
tom  right)  Streaks  of  sargassum  in  the  western  Gulf  of  Mexico,  as  observed  by  MERIS.  (Image  courtesy  ESA.) 


Rypina  et  al.,  2010;  Harrison  and  Clatzmaier,  2012],  although  other  techniques  have  been  suggested  [Mendoza 
and  Mancho,  201 0;  Mezic  et  al.,  201 0;  Haller  and  Beron-Vera,  2012].  LCS  are  used  to  identify  flow  regime  bounda¬ 
ries  and  areas  of  large  deformation  [Samelson,  2013].  Generally,  they  are  expected  to  be  more  robust  to  model 
error  than  individual  or  even  groups  of  trajectory  forecasts  [ Haller ,  2002;  Hernandez-Carrasco  et  al.,  2011;  Huntley 
et  al.,  2011].  LCS  have  been  found  to  be  useful  in  the  context  of  pollution  mitigation  in  the  ocean  in  that  they 
can  indicate  preferred  pathways  and  areas  of  decreased  predictability  [Coulliette  et  al.,  2007;  Huntley  et  al.,  2011; 
Olascoaga  and  Haller,  2012]. 

Nevertheless,  it  remains  unclear  whether  LCS  are  an  adequate  diagnostic  for  clusters.  Their  clear  advantage 
over  Eulerian  quantities  such  as  divergence  for  this  purpose  is  that  they  capture  a  particle's  or  patch's  his¬ 
tory.  Accumulation  after  all  is  not  an  instantaneous  process.  LCS  characterize  net  deformation  properties, 
and  deformation  can  be  separated  into  area  changes  and  area-preserving  stretch  (see  section  4).  Much  of 
the  theory  underlying  LCS  has  been  developed  in  the  context  of  divergence-free  flows  [Haller  and  Yuan, 
2000;  Boffetta  et  al.,  2001;  Haller,  2001;  Mancho  et  al.,  2006].  Yet  without  divergence,  particle  densities  can¬ 
not  change  (except  due  to  finite  sampling  phenomena)  and  clusters  will  not  form.  It  is  thus  natural  to  look 
for  alternatives  that  capture  a  particle's  history  of  divergence  but  separate  out  stretch  effects.  A  logical  can¬ 
didate  is  the  along-track  integral  of  divergence.  This  is  closely  related  to  what  we  will  call  "dilation"  (see  pre¬ 
cise  definition  in  section  4).  Integrated  over  a  patch,  dilation  gives  the  ratio  of  final  to  initial  patch  area. 

This  study  seeks  to  show  how  patterns  in  dilation  relate  to  patterns  visible  in  the  distribution  of  advected 
particles.  Its  advantages  over  both  FTLE  and  instantaneous  divergence  as  a  diagnostic  for  clusters  are 
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demonstrated.  To  do  so,  more  than  a  million  floating  particles  are  simulated  in  a  state-of-the-art  regional 
ocean  model.  However,  to  move  beyond  a  purely  visual  comparison,  a  metric  is  also  needed  to  identify  clus¬ 
ters  and  to  differentiate  clusters  of  greater  and  lesser  density.  Counting  particles  in  grid  boxes  is  a  common 
approach  [Schumacher  and  Eckhardt,  2002;  Boffetta  et  at.,  2004;  Cressman  et  at.,  2004],  but  suffers  from  its 
dependence  on  a  grid  and,  in  particular,  the  orientation  of  the  grid.  Thus,  we  introduce  a  grid-independent 
cluster  strength  function  as  an  alternative. 

The  model  experiment  is  briefly  described  in  section  2,  followed  by  a  discussion  of  how  to  define  clusters 
and  clustering  rigorously  in  section  3.  Section  4  contains  the  theoretical  development  of  dilation,  placing  it 
into  the  context  of  other  diagnostics.  Numerical  details  are  relegated  to  Appendix  A.  Section  5  presents  the 
analysis  of  the  diagnostics  for  material  accumulation  regions.  A  summary  and  conclusions  are  provided  in 
section  6. 


2.  Model  Description 

To  illustrate  the  concepts  of  clustering  and  dilation  and  to  explore  their  relationship  in  a  realistic  ocean  set¬ 
ting,  we  make  use  of  the  Navy  Coastal  Ocean  Model  (NCOM)  in  its  regional  configuration  maintained  at  the 
Naval  Research  Laboratory  [ Barron  et  a!.,  2006].  The  domain  consists  of  the  entire  Gulf  of  Mexico  and 
extends  to  the  eastern  shore  of  Florida  and  south  of  the  Yucatan  Strait,  ranging  from  98°W  to  79°W 
and  from  18°N  to  31°N.  The  boundary  conditions  are  supplied  by  a  global  implementation  of  NCOM  run 
with  1/8°  resolution  [Barron  et  at.,  2007].  The  horizontal  resolution  is  roughly  3  km.  There  are  50  vertical 
layers,  with  1 6  z-levels  below  550  m  and  34  rr-levels  above  that.  The  surface  layer,  which  is  used  for  the  anal¬ 
yses  presented  here,  has  a  nominal  thickness  of  0.5  m.  Surface  boundary  forcing  is  taken  from  the  Coupled 
Ocean/Atmosphere  Mesoscale  Prediction  System  (COAMPS)  [Hodur  et  at.,  2002;  Hodur,  1997].  River  input  is 
simulated  through  climatological  monthly  means,  which  are  set  at  the  center  of  each  month  and  then  inter¬ 
polated  in  time  for  other  days. 

The  particular  model  run  used  here  for  the  summer  of  2012  was  initialized  on  1  May  2012.  Sea  surface 
height  and  temperature  observations  from  satellites,  as  well  as  any  available  in  situ  observations  are 
assimilated  using  a  3DVar  approach  in  the  Navy  Coupled  Ocean  Data  Assimilation  (NCODA)  system 
[Cummings,  2005],  The  model  was  run  in  real  time,  so  that  data  latency  may  have  impacted  the 
analyses.  Model  output  was  stored  at  3  h  intervals  and  is  publicly  available  under  doi  number  10.7266/ 
N72Z13F4. 

The  example  used  throughout  this  paper  focuses  on  a  group  of  roughly  1.4  million  particles  launched  on  a 
1  km  grid  extending  from  95°W  to  82°W  and  from  20°N  to  the  coastline  near  30°N  on  15  July  2012,  00:00. 
These  were  advected  by  the  modeled  flow  for  10  days  using  a  fourth-order  Runge-Kutta  scheme  with  a 
10  min  time  step.  The  analysis  is  performed  on  the  region  between  91  °W  and  86°W  and  between  24°N 
and  28°N. 


The  experiment  was  repeated  using  the  geostrophic  surface  velocities  implied  by  the  model's  sea  surface 
height  field  VP.  These  were  computed  using  an  f-plane  approximation  and  the  equations 


u 


9 


f  dy  ’ 


(1) 


gd  T 
f  dx  ’ 


(2) 


where  g  is  the  gravitational  constant  and  f  is  a  mean  value  of  the  Coriolis  parameter  for  the  domain. 


3.  What  is  Clustering?  An  Objective  Definition 

Clustering  is  visually  easily  identified  (Figure  1),  but  to  our  knowledge,  lacks  a  standard  rigorous  definition. 
In  some  sense,  a  cluster  is  an  accumulation  of  particles  or  material.  The  process  of  clustering  is  then  the  for¬ 
mation  of  clusters.  There  are  various  mathematical  techniques  for  identifying  clusters  in  a  data  set,  including 
hierarchical  clustering  [Johnson,  1967],  expectation  maximization  clustering  [Dempster  et  al.,  1977],  and 
/(-means  clustering  [Lloyd,  1982].  The  last  of  these  has  previously  been  successfully  employed  for  identifying 
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drifter  clusters  to  derive  oceanographic  flow  properties  [ Koszalka  and  LaCasce,  2010;  Koszalka  et  ah,  2011]. 
Many  of  these  techniques  rely  on  parameters  that  define  maximum  spacing  within  a  cluster  or  the  target 
number  of  clusters.  In  the  context  of  particles  aggregating  on  the  ocean  surface,  these  parameters  are  diffi¬ 
cult  to  set  a  priori.  Moreover,  it  is  also  desirable  to  assign  a  degree  of  clustering  to  every  particle  that  differ¬ 
entiates  small  and  large  clusters,  but  also  dense  and  loose  clusters. 

One  possible  cluster  metric  is  the  particle  density,  defined  as  the  count  of  particles  in  each  box  of  a  grid 
overlaid  on  the  domain  of  interest,  as  mentioned  above.  The  resolution  of  the  resulting  particle  density  field 
is  necessarily  limited  to  the  chosen  grid  resolution,  and  if  the  grid  is  refined  too  much,  the  particle  density 
becomes  binary.  Moreover,  the  orientation  of  the  grid,  either  aligning  with  cluster  lines  or  not,  can  influence 
the  result.  Particle  density  is  restricted  to  integer  values,  sacrificing  the  nuance  available  to  continuous  met¬ 
rics.  Most  importantly  for  the  purposes  here,  it  is  insensitive  to  clusters  extending  beyond  a  grid  box  or  the 
density  of  a  cluster  within  a  grid  box. 

Another  metric  is  the  inverse  of  the  area  of  Voronoi  cells  [Monchaux  et  at.,  2010;  Fiabane  et  al.,  2012].  A 
Voronoi  diagram  divides  a  domain  into  cells,  such  that  each  cell  contains  one  particle  and  all  the  points 
closer  to  this  particle  than  to  any  other.  It  avoids  the  need  for  a  predefined  length  scale  or  grid  dependence. 
On  the  other  hand,  this  method  cannot  distinguish  between  particles  surrounded  by  a  few  and  those  sur¬ 
rounded  by  many  nearby  particles. 

To  overcome  some  of  these  shortcomings,  we  propose  here  the  cluster  strength  C  defined  for  any  location 
x*  as 


(3) 


where 


dn  |*n  tT  j , 


(4) 


is  the  distance  from  the  location  x*  to  the  nth  particle  at  xn,  and  L  is  a  tunable  length-scale  parameter.  One 
can  interpret  C(x*)  as  a  weighted  sum  of  all  particles,  where  the  weights  are  a  function  of  the  distance  from 
x*.  The  shape  of  the  weighting  function,  i.e.,  the  discount  rate  for  far-away  particles,  is  determined  by  L.  A 
reasonable  choice  for  L,  and  the  one  used  in  the  analysis  below,  is  half  the  mean  spacing  of  the  particles  if 
they  were  distributed  uniformly. 

Like  all  other  metrics,  C  does  have  some  undesirable  properties.  In  this  case,  they  are  related  to  edge  effects 
on  a  finite,  nonperiodic  domain.  On  such  a  domain,  even  equally  spaced  particles  on  a  grid  will  yield  a  non- 
uniform  cluster  strength  field:  in  a  strip  along  the  boundaries,  with  a  length  scale  of  2 L,  the  cluster  strength 
is  reduced.  One  way  to  avoid  these  biases  is  to  extend  the  computational  domain  beyond  the  analysis 
domain.  This  must  be  done  anyway  to  ensure  an  inflow  of  particles  through  open  boundaries.  Otherwise, 
artificial  voids  are  created  that  bias  the  results. 

It  should  also  be  noted  that  the  cluster  strength  C  is  not  strictly  conserved.  Each  particle's  influence  extends 
to  infinity  (although  it  is  small  outside  a  radius  of  2 L).  As  particles  travel  closer  to  or  farther  from  the  bound¬ 
ary  of  the  domain,  different  portions  of  the  Gaussian  weight  fall  inside  the  domain.  On  the  other  hand,  par¬ 
ticles  that  have  crossed  out  of  the  analysis  domain  but  remain  within  the  computational  domain  continue 
to  be  counted,  unlike  for  particle  density  defined  by  grid  counts. 

While  the  analysis  of  this  study  is  restricted  to  finite  collections  of  discrete  particles,  the  concept  of  cluster 
strength  can  easily  be  extended  to  a  continuous  tracer  with  concentration  p.  Denoting  the  domain  by  £2, 
the  cluster  strength  is  given  by 


C(x*)=  p(x)e-Wx)'L?  dx, 


(5) 


Q. 


where 


d(x)  =  |x-x*|, 


(6) 


is  the  distance  from  x*  to  x. 
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Figure  2.  (a)  Cluster  strength  on  25  July  2012  00:00,  computed  on  a  2  km  grid  from  particles  launched  on  a  1  km  grid  10  days  before,  (b)  The  same  with  particle  positions  overlaid  in  red. 


The  main  advantages  of  cluster  strength  as  a  metric  are:  (1)  It  is  independent  of  any  grid.  (2)  It  differentiates 
degrees  of  aggregation.  (3)  Its  only  parameter  can  be  determined  a  priori  based  on  the  length  scales  of 
interest.  Like  all  clustering  metrics,  the  cluster  strength  is  a  function  of  the  particle  positions.  Thus,  the 
initial  positions  and  integration  length  for  the  particle  trajectories  will  critically  affect  the  cluster  strength 
field. 

An  illustration  of  the  application  of  cluster  strength  is  given  in  Figure  2.  Given  the  ~1 .4  million  trajectories 
from  the  main  experiment  outlined  in  section  2,  C  was  computed  at  the  end  of  the  1 0  day  integration  on  a 
2  km  grid,  nested  inside  the  domain  used  for  the  trajectory  calculations,  using  a  length  scale  of  L  =  0.5  km. 
The  ridges  in  cluster  strength  shown  in  Figure  2a  clearly  coincide  with  the  visually  identifiable  "clusters"  of 
particles  shown  in  Figure  2b. 

Now  we  can  rigorously  and  objectively  define  a  "cluster"  as  a  region  of  locally  higher  cluster  strength  C,  and 
"clustering"  as  the  increasing  of  cluster  strength  over  time. 

While  many  observations  of  cluster  patterns  exist,  especially  from  satellite  and  airborne  images  (e.g., 
Figure  1),  it  is  well  established,  theoretically  [Sawford,  2001,  and  references  therein],  numerically  [Poje  et  a!., 
2010],  and  observationally  [Poje  et  at.,  2014],  that,  on  average,  floating  particles  tend  to  disperse.  This  may 
appear  as  a  contradiction.  Flowever,  the  rearrangement  of  initially  uniformly  distributed  particles  into  a  dis¬ 
tribution  with  significant  peaks  in  C  also  requires  the  creation  of  voids,  or  areas  with  locally  lower  cluster 
strength.  The  eye  is  drawn  to  the  clusters,  but  the  particle  pair  statistics  are  heavily  influenced  by  the  large 
separations  at  the  virtually  unbounded  upper  end  of  the  distribution. 

Cressman  et  al.  [2004]  illustrated  this  statistical  phenomenon  nicely  by  showing  the  evolving  probability  dis¬ 
tribution  of  particle  separations,  starting  from  an  initially  uniform  distribution,  in  a  numerical  model  of  a  3-D 
turbulent  flow  (see  their  Figure  11).  A  similar  analysis  for  the  model  used  here  shows  similar  results 
(Figure  3).  At  launch,  the  probability  distribution  is  a  delta  function  centered  at  1  km  (Figure  3a).  Over  time, 
it  flattens  out,  with  both  the  mean  and  the  median  increasing  (Figure  3c),  consistent  with  relative  dispersion 
in  the  average  behavior.  Simultaneously,  the  probabilities  of  low  separation  distances  increase  as  well 
(Figure  3b),  which  reflects  the  appearance  of  clusters. 

Note  that  the  identity  of  close-by  pairs  changes.  For  example,  while  the  total  number  of  pairs  closer  than 
575  m  remains  almost  constant  between  3  and  10  days  of  integration,  only  28%  of  the  pairs  in  this  group 
after  3  days  remain  in  it  after  10  days.  Thus,  pairs  approach  each  other,  separate,  and  come  closer  again. 
This  indicates  that  clusters  form,  breakup,  and  reform  with  new  constituents  as  well. 
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Figure  3.  (a  and  b)  Probability  distribution  of  pair  separations  over  time.  Each  colored  line  corresponds  to  a  particular  integration  time.  Pairs  are  defined  on  the  original  grid,  whereby 
each  particle  is  partnered  with  the  nearest  particle  to  the  north  of  it.  Probabilities  are  computed  using  bins  with  100  m  width.  Some  separation  distances  at  the  later  times  exceed 
600  km;  for  clarity,  two  excerpts  of  the  full  distribution  are  shown,  (c)  Time  series  of  the  mean  and  median  separation  distances. 


4.  Dilation,  Stretch,  and  Finite-Time  Lyapunov  Exponents:  Definitions  and  Theory 

Material  accumulation  is  a  form  of  deformation  of  the  initial  material  patch.  There  are  three  types  of  defor¬ 
mation  of  a  2-D  patch:  deformation  that  retains  the  area  and  shape  of  the  patch  (rotation),  deformation 
that  changes  the  area  of  the  patch  but  retains  its  shape  (dilation),  and  that  which  keeps  the  area  of  the 
patch  constant  but  changes  its  shape  (stretch).  Of  course,  they  usually  act  together  resulting  in  both  area 
and  shape  changes.  The  filamentation  observed  in  images  such  as  Figures  1  and  2b  generally  results  from  a 
combination.  A  change  in  density  must  be  caused  by  dilation,  but  the  resulting  clusters  are  also  stretched 
into  complex  shapes  and  rotated  into  different  orientations.  Orientation  is  not  of  concern  in  the  present 
context;  so  rotation  will  not  be  considered  further. 

To  make  these  notions  more  precise,  consider  the  flow  given  by 

x=u(x,t)  xeflcR2,te[to,f],  (7) 

where  u  is  a  smooth  velocity  field  and  O  is  an  open  subset  of  R2.  (The  concepts  below  can  be  extended  to 
3-D,  but  this  is  not  considered  here.)  The  trajectory  of  a  particle  with  initial  position  xQ  at  time  fQ  is  then 
given  by 


x(f;fo,x0)=x0  + 


u(x,  r)dr. 


Material  deformation  is  captured  by  the  deformation  tensor 


(8) 
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F  is  a  linearization  of  the  flow  map  in  xa.  Its  effects  on  an  infinitesimal  fluid  patch  can  be  described  by  its 
singular  value  decomposition.  Under  a  linear  map,  a  circle  is  mapped  into  an  ellipse,  and  the  singular  values 
/q  >  provide  the  lengths  of  the  semimajor  and  the  semiminor  axes  of  the  image  ellipse,  respectively. 
The  area  of  the  ellipse  is  n/q  ju2.  In  fact,  in  general,  for  any  nonlinear  flow,  the  area  of  a  deformed  patch  £2, 
evolved  from  Qa,  is 


A  = 

dx= 

. 

. 

Q 

, 

|det  (F)\  dx0 


si„ 


/h  A  2  dxa. 


(10) 


Thus,  we  define  dilation  as  the  product  of  the  singular  values  of  F: 

S=fi^i2.  (11) 


Shape  changes  under  a  linear  map  can  be  summarized  by  the  eccentricity  or  the  ellipticity  of  the  image 
ellipse.  Both  of  these  are  functions  of  the  ratio  of  the  major  to  the  minor  axis  lengths.  Generalizing  this  con¬ 
cept  to  an  arbitrary  flow,  we  define  stretch  as  the  ratio  of  the  singular  values  of  F\ 

<r=  — .  (12) 


Both  dilation  and  stretch  are  functions  of  t,  and  the  longer  a  patch  in  the  flow  evolves,  the  more  it  is 
expected  to  deform.  Consequently,  in  some  cases,  it  is  desirable  to  consider  dilation  and  stretch  rates.  These 
are  defined,  respectively,  as 


A_  log 


(13) 


Z= 


log  N 

t 


(14) 


As  mentioned  above,  a  popular  approach  to  describing  transport  patterns  considers  ridges  in  the  finite- 
time  Lyapunov  exponent  (FTLE)  field.  The  FTLE  captures  relative  dispersion  behavior  of  particles  as  an  expo¬ 
nential  separation  rate.  It  is  defined  as  a  function  of  the  leading  singular  value  of  F: 


FTLE=l°i M. 

t 


(15) 


From  equations  (1-15),  it  follows  that 


FTLE  = 


A+X 

2 


(16) 


The  separation  rate  is  the  average  of  the  dilation  and  the  stretch  rates.  Dilation  and  stretch,  thus,  constitute 
a  decomposition  of  the  FTLE,  which  permits  separating  processes  resulting  in  area  changes  from  those 
resulting  in  shape  changes. 

Dilation  also  has  an  interesting  relationship  with  Eulerian  divergence  V  ■  u.  It  can  be  shown  that  the  mate¬ 
rial  derivative  of  dilation  satisfies 


D<5 

Df 


=(5  V  ■  u, 


(17) 


from  which  it  follows  that 


<5(f) =exp 


V  ■  u\x( 


]  dx 


where  the  integral  is  taken  along  trajectories.  Along  with  equation  (13),  this  yields  the  result  that 


(18) 
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Figure  4.  (top)  Sea  surface  height  anomaly  on  20  July  2012  00:00  with  (a)  the  full  velocity  field  and  (b)  the  geostrophic  velocity  field,  (bottom)  Positions  of  particles  launched  on  a  1  km 
grid  on  1 5  July  201 2  00:00,  after  1 0  days  of  integration  of  (c)  the  full  velocity  field  and  (d)  the  geostrophic  velocity  field. 


V  ■  u[x( t),  t]  d t, 


i.e.,  the  dilation  rate  is  the  average  divergence  experienced  by  a  particle  along  its  trajectory. 


(19) 


5.  Cluster  Diagnostics 

Above  we  stated  that  material  accumulation  requires  dilation  less  than  1  and  hence  nonzero  divergence.  If 
V  ■  u= 0,  then  by  equation  (18),  <5  =  1  and  by  equation  (10),  the  area  of  any  patch  stays  constant.  Under 
these  conditions,  material  cannot  accumulate.  Particle  clustering  has  previously  been  reported  in  incom¬ 
pressible  flows,  but  this  is  somewhat  misleading,  since  either  a  random  component  is  added  to  the  incom¬ 
pressible  velocity  field  [ Klyatskin  and  Saichev,  1997]  or  inertial  particles  are  modeled  [Boffetta  et  at.,  2004; 
Fiabane  et  a!.,  201 2].  In  either  case,  a  compressible  part  is  introduced  into  the  particles'  Lagrangian  velocity. 
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(b) 


Figure  5.  Time  series  of  (a)  max  (C)  and  (b)  mean(C)  for  full  and  geostrophic  velocities  on  the  domain  shown  in  Figure  4,  computed  from  the  particles  launched  on  a  1  km  grid  on  1 5 
July  2012  00:00. 


In  the  context  of  ocean  flows,  the  lack  of  clustering  in  divergence-free  flows  can  be  verified  by  consid¬ 
ering  the  geostrophic  velocity  field.  The  geostrophic  flow  can  be  estimated  reasonably  easily  from  syn¬ 
optic  sea  surface  height  measurements  and  temperature  and  salinity  profiles,  unlike  the  full  flow 
field.  In  many  applications,  the  geostrophic  flow  is  considered  a  good  approximation  to  the  full  flow 
field,  and  it  has  been  shown  to  yield  adequate  estimates  of  the  stretch  deformation  patterns  [Olas- 
coaga  et  al.,  2013;  Haller  and  Beron-Vera,  2013].  Given  the  theoretical  considerations  above,  however,  it 
is  not  surprising  that  the  geostrophic  approximation  to  the  flow  field  fails  to  capture  the  clustering 
behavior. 

Figure  4  illustrates  this  point.  The  top  plots  compare  the  full  and  the  geostrophic  velocities  at  one  snapshot 
in  time,  while  the  bottom  plots  show  the  particle  distributions  after  10  days.  Particles  were  initialized  on  a 
uniform  grid  and  advected  by  the  respective  velocity  field. 

The  differences  in  the  velocities  are  barely  noticeable  to  the  eye.  Yet  the  particle  distributions  have  little  in 
common.  In  particular,  the  geostrophic  velocities  do  not  lead  to  cluster  formation.  This  is  visually  obvious 
and  can  be  verified  quantitatively:  The  maximum  cluster  strength  achieved  with  the  geostrophic  approxi¬ 
mation  is  7.5,  whereas  with  the  full  velocity  field  it  is  71.2.  While  only  0.01%  of  grid  points  exhibit  a  cluster 
strength  above  5  after  geostrophic  advection,  1.78%  of  grid  points  do  after  full  advection.  Less  than  2% 
may  not  seem  like  a  large  number,  but  clusters  do  not  dominate  the  domain;  they  are  isolated,  small,  but 
dense  structures. 

The  maximum  achieved  cluster  strength  in  the  domain  of  interest  is  more  generally  useful  for  sum¬ 
marizing  the  clustering  character  of  a  set  of  particles  quantitatively.  Figure  5a  shows  that  it  is  increasing 
throughout  the  10  day  integration  period  for  the  example  from  Figure  4,  if  the  full  velocity  field  is  used.  It 
increases  slightly  from  its  initial  value  even  in  the  divergence-free  flow,  as  the  initially  gridded  particles  rear¬ 
range  themselves  into  a  more  random  distribution. 

The  mean  cluster  strength  (Figure  5b)  remains  more  or  less  constant  throughout  the  integration  time.  There 
is  a  small  adjustment  within  the  first  day  for  both  cases,  after  which  the  full  velocity  integration  exhibits  a 
very  slight  increase.  Recall  that  cluster  strength  is  not  strictly  conserved,  mainly  due  to  edge  effects.  Figure 
5b  confirms  that  these  effects  remain  negligible. 

In  section  3,  the  probability  distributions  of  particle  pair  separations  were  shown  to  be  consistent  with 
relative  dispersion  on  average  coexisting  with  cluster  formation.  Particle  pair  statistics,  however,  are 
restricted  to  original  neighbors  and  are  hence  limited  in  their  ability  to  diagnose  clusters.  These  statis¬ 
tics  for  particles  in  the  nonclustering  geostrophic  flow  follow  similar  patterns  as  those  for  particles  in 
the  clustering  full  flow:  the  initial  delta  function  centered  at  1  km  flattens  out,  with  probabilities  at 
both  extremes  of  the  spectrum  increasing  over  time,  even  though  clustering  is  not  observed.  The  distri¬ 
butions  do  differ,  as  can  be  seen  in  Figure  6,  with  pair  separations  more  likely  to  be  small  in  the  full 
flow.  Flowever,  these  small  differences  hardly  indicate  the  large  differences  in  clustering  behavior,  as 
seen  in  Figures  4  and  5a. 
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Dilation  rate  has  the  advantage  that  it 
is  computed  for  individual  particles, 
without  reliance  on  neighbors  (see 
Appendix  A).  The  dilation  rate  field  can 
also  be  computed  on  a  grid.  Just  as  for 
the  FTLE  field,  this  is  accomplished  by 
integrating  trajectories  initialized  at  the 
grid  locations  backward  in  time. 
Figure  7a  shows  the  gridded  dilation 
rate  corresponding  to  the  particle  dis¬ 
tribution  shown  in  Figure  4c.  Visually, 
the  patterns  delineated  by  the 
blue  areas  in  Figure  7a  are  very  similar 
to  those  traced  out  by  the  particles 
themselves  in  Figure  4c  and  those  dis¬ 
cernible  in  the  cluster  strength  in 
Figure  2. 

Figures  7b  and  7c  show  the  stretch 
rate  and  FTLE,  respectively.  All  three 
fields  exhibit  similar  patterns.  Thus,  one 
might  be  tempted  to  conclude  that 
any  of  these  three  quantities  can  serve 
equally  well  as  a  cluster  diagnostic. 
That  this  is  not  the  case  becomes 
obvious  when  the  geostrophic 
velocity  field  is  used  for  the  integration. 
Figure  8  shows  the  FTLE  for  the  geo¬ 
strophic  case.  Since  dilation  rate  is  the 
path-averaged  divergence  (equation 
(19)),  it  is  zero  up  to  numerical  error  for 
the  geostrophic  velocity  field.  Hence 
also,  stretch  rate  for  the  geostrophic 
case  is  simply  twice  the  FTLE. 
The  field  exhibits  strong  ridges,  similar 
to  those  in  Figure  7,  even  though  there 
is  no  discernible  clustering  (Figure  4d). 

Moreover,  the  similarity  among  the  dilation  rate,  stretch  rate,  and  FTLE  fields  in  Figure  7  is  not  as 
strong  as  it  appears:  note  that  the  correlation  between  dilation  rate  and  stretch  rate  is  merely  -0.04 
over  the  pictured  domain.  Since  stretch  generally  has  a  stronger  signal  than  dilation  in  this  region, 
with  greater  variance,  the  variability  in  the  FTLE  (recall  that  it  is  the  average  of  dilation  rate  and  stretch 
rate)  is  dominated  by  the  stretch  rate  component:  the  correlation  with  stretch  rate  is  0.88,  while  that 
with  dilation  rate  is  0.44. 

The  theory  presented  in  section  4  suggests  that  clusters  should  be  associated  with  high  negative  dilation 
rates,  i.e.,  particles  cluster  if  they  experience  large  convergence  on  average  over  their  trajectory.  In  the  com¬ 
putational  example,  the  correlation  between  cluster  strength  and  dilation  rate  is  -0.57.  In  comparison,  the 
correlations  of  cluster  strength  with  FTLE  and  stretch  rate  are  -0.13  and  0.16,  respectively. 

Clearly,  dilation  rate  correlates  much  better  with  cluster  strength  than  either  of  the  other  two  metrics.  None¬ 
theless,  -0.57  is  not  a  particularly  outstanding  correlation.  This  can  be  explained  in  part  by  the  nonlinear 
relationship  between  cluster  strength  and  dilation  rate,  as  shown  in  Figure  9. 

That  figure  also  suggests  that  positive  dilation  rates  are  much  more  prevalent  than  negative  ones,  at  least 
when  considering  the  entire  domain  sampled  at  grid  points.  In  fact,  71%  of  grid  points  have  positive  dilation 
rates.  This  is  another  piece  of  evidence  that  relative  dispersion  easily  coexists  with  clustering. 


(a) 


time  since  launch  (days) 

(b) 


Figure  6.  (a)  Cumulative  probability  distributions  of  pair  separations  after  10  days 
for  full  (solid)  and  geostrophic  (dashed)  velocity  fields.  Pairs  are  defined  on  the 
original  grid,  whereby  each  particle  is  partnered  with  the  nearest  particle  to  the 
north  of  it.  Probabilities  are  computed  using  bins  with  100  m  width.  The  insert 
shows  the  excerpt  at  the  lower  end  of  the  scale,  with  the  initial  separation  dis¬ 
tance  of  1  km  marked  as  a  dotted  line,  (b)  Time  series  of  the  difference  between 
the  cumulative  probability  distributions  (full  minus  geostrophic)  at  1  km. 
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(b) 


Figure  7.  (a)  Dilation  rate,  (b)  stretch  rate,  and  (c)  FTLE  for  the  time  period  15  July  2012  00:00  to  25  July  2012  00:00,  computed  on  a  1  km  grid. 


If  particles  preferentially  assemble  in  regions  of  high  negative  dilation  rate,  then  the  dilation  rates  sampled 
by  the  particles  should  have  a  different  distribution  than  those  sampled  at  grid  points.  This  is  indeed  the 
case.  For  the  example  at  hand,  72%  of  the  particles  launched  on  1 5  July  201 2  on  the  1  km  grid  have  experi¬ 
enced  total  negative  dilation  by  25  July  2012.  Thus,  negative  dilation  rate  actually  has  predictive  power  for 
particle  positions.  In  other  words,  areas  exhibiting  negative  dilation  rate  (here  29%)  are  more  likely  (here 
more  than  6  times  as  likely)  to  be  visited  by  particles  than  those  with  a  positive  dilation  rate. 

Dilation  rate  is  not  cheap  to  compute.  So  the  question  arises  whether  one  can  get  away  without  integrating 
divergence  along  trajectories  (see  Appendix  A  for  the  actual  algorithm)  and  simply  use  instantaneous  diver¬ 
gence,  as  suggested  by  Zhong  et  at.  [2012].  Figure  10  displays  the  divergence  field  at  the  end  of  the  integra¬ 
tion  time  period,  25  July  2012.  It  is  not  as  smooth  as  the  Lagrangian  fields,  but  it  does  contain  the  strong 
linear  feature  in  the  southwest  quadrant  of  the  displayed  domain.  In  this  domain,  51%  of  the  area  experien¬ 
ces  negative  divergence.  Of  the  particles  launched  on  15  July  2012  that  were  in  this  domain  on  25  July 
2012,  58%  land  in  regions  of  negative  divergence.  So  there  is  indeed  a  small  sampling  bias  toward 
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Identifying  regions  of  clustering  in  the 
ocean  is  of  interest  for  various  reasons: 
these  regions  often  exhibit  interesting 
dynamics.  They  are  biologically  impor¬ 
tant,  as  they  can  aggregate  nutrients. 
For  pollution  response,  they  are  critical, 
as  they  indicate  where  the  clean-up 
can  be  most  efficient.  In  this  work,  we 
have  provided  a  rigorous  definition  of 
"clustering,"  along  with  a  metric  to 
determine  a  degree  of  clustering.  We  have  posited  a  predictive  relationship  between  dilation  and  clustering 
strength,  based  on  theory.  Within  the  context  of  a  numerical  example  in  a  realistic  ocean  model,  this  rela¬ 
tionship  was  shown  to  hold.  While  advected  material  preferentially  occupies  regions  of  negative  dilation 
rate,  most  of  the  model  ocean  exhibits  positive  dilation  rate,  consistent  with  the  well-established  average 
dispersion  behavior. 


Figure  8.  FTLE  for  the  time  period  1 5  July  201 2  00:00  to  25  July  2012  00:00,  com¬ 
puted  on  a  1  km  grid,  using  geostrophic  velocities. 


convergence,  but  not  nearly  as  strong 
as  for  dilation  rate.  The  predictive 
power  of  divergence  for  particle  posi¬ 
tions  is  thus  small.  Nor  does  divergence 
correlate  well  with  cluster  strength:  the 
correlation  is  -0.09.  Clearly,  the  com¬ 
putational  shortcut  of  ignoring  a  par¬ 
ticle's  past  does  not  lead  to  satisfactory 
results  for  integration  periods  this  long. 

6.  Conclusions 


For  the  aggregation  of  material,  convergence  is  clearly  critical.  Clustering,  however,  is  an  integrative  process, 
and  thus  dilation,  the  path-integrated  divergence,  is  more  appropriate  as  a  diagnostic  than  instantaneous 
Eulerian  divergence. 


Figure  9.  (a)  Scatterplot  of  cluster  strength  as  a  function  of  dilation  rate  computed  for  the  10  day  integration  from  15  July  2012  to  25  July 
2012  on  a  1  km  grid  over  the  domain  shown  in  Figure  4.  (b)  Two-dimensional  histogram  of  the  same  data  (with  cluster  strength  capped  at 
10),  expressed  as  a  percentage  of  the  total  number  of  samples. 
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Maps  of  dilation  provide  an  excellent 
guide  to  where  clusters  of  advected 
particles,  that  were  initially  uniformly 
spaced,  will  be  found.  Nonetheless,  it  is 
important  to  remember  that  the 
locations  of  the  clusters  are  not  neces¬ 
sarily  places  of  local  convergence. 
Aggregated  material  continues  to  be 
advected,  stretched  into  different 
shapes,  pulled  apart,  and  reassembled. 
What  makes  dilation  such  a  powerful 
diagnostic  is  its  ability  to  capture  the 
cumulative  effects  of  these  processes 
on  cluster  positioning. 

The  most  straightforward  method  for 
identifying  cluster  locations  developing 
from  a  uniform  distribution  in  a  model 
is  to  simply  advect  a  large  number  of 
particles.  To  compute  dilation,  the 
divergence  field  has  to  be  integrated 
along  trajectories.  So  what  is  gained  by 
adding  this  additional  step  to  the  cal¬ 
culation?  The  main  advantage  is  that  a  much  coarser  grid  of  trajectories  can  be  used,  requiring  far  fewer 
integrations,  because  a  single  along-trajectory  integration  provides  the  dilation  at  that  point,  whereas  clus¬ 
ter  strength  requires  a  whole  field  of  particles  to  be  meaningful.  Moreover,  by  employing  integrations  back¬ 
ward  in  time,  dilation  can  be  computed  for  a  target  area  without  seeding  a  much  larger  region  that  may  or 
may  not  contribute  to  clustering  in  the  target  area.  Such  backward  integrations  cannot  be  leveraged  for 
direct  particle  advections.  (This  latter  advantage  becomes  less  relevant  in  the  context  of  clusters  evolving 
from  initially  non-uniformly  distributed  material,  a  case  not  presented  here.) 

While  the  aggregation  process  is  most  relevant  for  cluster  identification,  deformation  also  includes  stretch¬ 
ing  processes.  Here  we  have  only  made  preliminary  inquiries  into  the  relationship  between  stretch  and  dila¬ 
tion.  The  patterns  exhibited  in  the  numerical  example  for  the  two  fields  are  visually  remarkably  similar,  yet 
show  very  low  correlation.  This  raises  interesting  questions  about  the  connections  between  the  two  fields, 
both  statistically  and  physically.  Strong  lines  of  stretching  may,  for  example,  also  be  associated  with  strong 
dilation;  if  the  relationship  is  nonlinear,  correlation  is  not  the  best  metric  to  tease  this  out.  The  correlation 
may  also  be  scale  dependent  or  require  a  spatial  lag.  Future  work  will  delve  further  into  these  topics. 

A  shortcoming  of  dilation  is  that  it  provides  no  insight  into  the  physical  processes  responsible  for  the  cluster 
formation.  In  fact,  its  strength  of  reflecting  cumulative  effects,  which  is  critical  for  a  diagnostic,  makes  the 
identification  of  these  challenging.  These  aspects  are  addressed  by  Jacobs  et  at.  [2015]. 

Most  observed  clusters  originate  from  nonuniformly  distributed  material.  Advection  is  not  the  only  process 
responsible  for  forming  clusters;  biology  and  chemistry  also  have  a  role  to  play.  These  aspects  have  been 
neglected  in  the  analysis  presented  here.  How  effective  dilation  maps  can  be  in  tracing  accumulation 
regions  of  nonuniformly  initialized  material  remains  a  subject  of  active  inquiry. 

Appendix  A:  Computing  Dilation  and  Related  Quantities 

Dilation  is  the  product  of  the  singular  values  of  the  deformation  tensor  F  (see  equation  (11)).  Since  F  is  a  2 
X  2  matrix  (see  equation  (9)),  the  singular  values  can  be  derived  using  an  analytic  formula.  Thus,  the  pri¬ 
mary  challenge  is  to  compute  the  deformation  tensor  itself.  Traditionally,  this  has  been  achieved  using  a 
finite  difference  algorithm  for  each  of  the  partial  derivatives  [Lekien  et  at.,  2005;  Branicki  and  Wiggins,  201 0]. 
The  problem  with  this  approach  is  that  a  fairly  dense  grid  of  trajectories  is  needed  for  an  accurate  approxi¬ 
mation.  Alternatively,  each  point  on  a  coarser  grid  requires  the  integration  of  four  additional  trajectories  at 
subgrid  spacing  [ Beron-Vera  et  at.,  2008]. 
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This  increased  computational  cost  can  be  avoided  by  rewriting  the  problem  as  an  integral: 


(A1) 


which  follows  directly  from  equation  (8).  By  the  chain  rule,  this  then  becomes 


(A2) 


(A3) 
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Now  the  components  of  the  deformation  tensor  can  be  computed  using  numerical  quadrature.  The  integra¬ 
tion  is  performed  along  the  trajectory,  which  is  computed  first  by  integrating  the  path  equation  (8).  For  the 
calculations  used  here,  we  chose  a  fixed-step  fourth-order  Runge-Kutta  scheme  with  a  10  min  time  step  for 
both  integrations.  Velocities  and  velocity  gradients  are  interpolated  trilinearly,  in  space  and  time,  to  the  par¬ 
ticle  positions.  Velocity  gradients  are  approximated  from  gridded  velocities  using  the  second-order  cen¬ 
tered  finite  difference  scheme. 

The  product  of  the  singular  values  of  a  square  matrix  is  also  identical  to  the  absolute  value  of  its  determinant. 
Thus,  we  compute  dilation  as  the  determinant  of  F.  To  compute  the  FTLE,  we  find  the  leading  singular  value 
using  the  analytic  formula  for  a  2  X  2  matrix.  There  are  well-known  challenges  with  establishing  the  smaller  sin¬ 
gular  value  accurately  using  the  analytic  expression,  since  it  involves  subtracting  two  terms  that  may  both  be 
large,  but  whose  difference  may  be  small  and  hence  subject  to  substantial  numerical  error.  Instead,  we  divide 
the  determinant  by  the  leading  singular  value,  an  approach  that  is  more  robust  to  numerical  error. 
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